Depositor: Lynn M. LoPucki

------------------------------------------------------------
1. Overview
------------------------------------------------------------

This repository contains the data and code necessary to reproduce 
the statistical results reported in the manuscript, including 
Tables 1–3 and associated inferential tests.

The analysis evaluates:

(1) Differences in emissions intensity across industries.
(2) Differences in greenhouse gas reporting rates by firm revenue size.

All inferential statistics reported in the manuscript can be 
reproduced using the files included in this dataset.

------------------------------------------------------------
2. Files Included
------------------------------------------------------------

data.xlsx
    Dataset used to generate all descriptive statistics and
    statistical tests in the manuscript.

analysis.ipynb
    Jupyter notebook containing all code required to reproduce:
        - Emissions intensity calculations
        - Kruskal–Wallis rank-sum test
        - Fisher’s Exact test (Table 3)

README.txt
    This file.

------------------------------------------------------------
3. Data Description
------------------------------------------------------------

Each row represents one company.

Key variables:

Company GHG Name
    Company name.

RevenuesGhgCo
    Company revenues (in millions of U.S. dollars).

Scope1+2Total
    Total Scope 1 + Scope 2 greenhouse gas emissions.

SasbIndustry
    Industry classification.

------------------------------------------------------------
4. Variable Construction
------------------------------------------------------------

Reporting Status

A company is classified as "reporting" if:
    - Scope1+2Total is non-missing, AND
    - RevenuesGhgCo is non-missing and positive.
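
The two conditions above can be sketched in pandas as follows. This is a
minimal illustration, not the notebook's own code: the function name
reporting_flag and the toy rows are hypothetical, and it assumes that
missing values appear as blank or non-numeric cells.

```python
import numpy as np
import pandas as pd


def reporting_flag(df: pd.DataFrame) -> pd.Series:
    """Return True where a company meets both reporting conditions."""
    # Assumption: blank or text cells should count as missing, so coerce
    # both columns to numeric and let invalid entries become NaN.
    emissions = pd.to_numeric(df["Scope1+2Total"], errors="coerce")
    revenues = pd.to_numeric(df["RevenuesGhgCo"], errors="coerce")
    return emissions.notna() & revenues.notna() & (revenues > 0)


# Toy rows for illustration only; these are not values from data.xlsx.
toy = pd.DataFrame({
    "Scope1+2Total": [120.0, np.nan, 50.0, 10.0],
    "RevenuesGhgCo": [1500.0, 3000.0, np.nan, 0.0],
})
toy["reporting"] = reporting_flag(toy)
print(toy)
```

Only the first toy row qualifies: the others fail because emissions are
missing, revenues are missing, or revenues are not positive.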

Emissions Intensity

Emissions intensity is defined as:

    Intensity = (Scope 1 + Scope 2 emissions)
                / (Revenues in millions USD)

Because revenues are expressed in millions of U.S. dollars, intensity
is measured in units of Scope1+2Total per million dollars of revenue.
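
The definition above can be sketched as a column-wise division. This is
an illustrative sketch under stated assumptions: the function name
emissions_intensity and the toy rows are hypothetical, and masking
non-positive revenues is a choice made here to avoid dividing by zero.

```python
import numpy as np
import pandas as pd


def emissions_intensity(df: pd.DataFrame) -> pd.Series:
    """Scope 1 + Scope 2 emissions divided by revenues in millions USD."""
    emissions = pd.to_numeric(df["Scope1+2Total"], errors="coerce")
    revenues = pd.to_numeric(df["RevenuesGhgCo"], errors="coerce")
    # Assumption: zero or negative revenues are treated as missing so the
    # division yields NaN rather than inf or a negative intensity.
    revenues = revenues.where(revenues > 0)
    return emissions / revenues


# Toy rows for illustration only; these are not values from data.xlsx.
toy = pd.DataFrame({
    "Scope1+2Total": [200.0, 90.0, np.nan, 10.0],
    "RevenuesGhgCo": [1000.0, 4500.0, 2000.0, 0.0],
})
toy["intensity"] = emissions_intensity(toy)
print(toy)
```

Rows with missing emissions or non-positive revenues receive a missing
intensity, which matches the reporting definition in this section.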

------------------------------------------------------------
5. Statistical Tests
------------------------------------------------------------

A. Cross-Industry Comparison

A Kruskal–Wallis rank-sum test is used to evaluate whether
emissions intensity distributions differ across industries.

Only industries with at least two reporting firms are included.
The test is nonparametric and does not rely on assumptions of
normality or equal variances.
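
The test described above can be sketched with scipy.stats.kruskal. This
is a hedged illustration, not the notebook's own code: the column name
"intensity", the function name kruskal_by_industry, and the toy rows are
hypothetical.

```python
import pandas as pd
from scipy.stats import kruskal


def kruskal_by_industry(df, value_col="intensity",
                        group_col="SasbIndustry", min_firms=2):
    """Kruskal-Wallis test across industries with >= min_firms firms."""
    # Keep only rows with a usable intensity and industry label.
    valid = df.dropna(subset=[value_col, group_col])
    kept, samples = [], []
    for industry, group in valid.groupby(group_col):
        # Industries with fewer than min_firms reporting firms are excluded.
        if len(group) >= min_firms:
            kept.append(industry)
            samples.append(group[value_col].to_numpy())
    if len(samples) < 2:
        raise ValueError("need at least two industries to compare")
    return kept, kruskal(*samples)


# Toy rows for illustration only; these are not values from data.xlsx.
toy = pd.DataFrame({
    "SasbIndustry": ["A", "A", "A", "B", "B", "B", "C"],
    "intensity": [0.1, 0.2, 0.3, 1.0, 1.2, 1.5, 5.0],
})
kept, result = kruskal_by_industry(toy)
print(kept, result.statistic, result.pvalue)
```

In the toy data, industry "C" has a single firm and is therefore left
out of the comparison.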

B. Revenue and Reporting (Table 3)

A two-tailed Fisher’s Exact test is used to evaluate the association
between:

    - Reporting status (reporting vs. non-reporting), and
    - Firm size (revenues < $2 billion vs. >= $2 billion).

The $2 billion threshold corresponds to a RevenuesGhgCo value of 2000,
since revenues are expressed in millions of U.S. dollars.
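
The 2x2 comparison above can be sketched with scipy.stats.fisher_exact.
This is an illustration under stated assumptions, not the notebook's own
code: the column name "reporting", the function name
fisher_reporting_by_size, and the toy rows are hypothetical. Rows with
missing revenues are dropped here because they cannot be assigned to a
size group; the notebook may treat them differently.

```python
import pandas as pd
from scipy.stats import fisher_exact

THRESHOLD = 2000  # $2 billion, with revenues expressed in millions USD


def fisher_reporting_by_size(df, reporting_col="reporting"):
    """Two-tailed Fisher's exact test of reporting status by firm size."""
    revenues = pd.to_numeric(df["RevenuesGhgCo"], errors="coerce")
    # Assumption: firms without a revenue figure are excluded.
    has_revenue = revenues.notna()
    large = (revenues[has_revenue] >= THRESHOLD).rename("large_firm")
    reporting = df.loc[has_revenue, reporting_col].astype(bool)
    table = pd.crosstab(large, reporting)
    # Force a full 2x2 layout even if one cell combination is absent.
    table = table.reindex(index=[False, True], columns=[False, True],
                          fill_value=0)
    odds_ratio, p_value = fisher_exact(table.to_numpy(),
                                       alternative="two-sided")
    return table, odds_ratio, p_value


# Toy rows for illustration only; these are not values from data.xlsx.
toy = pd.DataFrame({
    "RevenuesGhgCo": [500, 800, 1200, 1999, 2000, 3500, 9000, 15000],
    "reporting": [False, False, False, True, True, True, True, True],
})
table, odds_ratio, p_value = fisher_reporting_by_size(toy)
print(table)
print(p_value)
```

A firm with revenues of exactly 2000 falls in the larger group, in line
with the ">= $2 billion" definition above.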

------------------------------------------------------------
6. Replication Instructions
------------------------------------------------------------

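Software requirements:

    Python 3.10+
    Jupyter (Notebook or JupyterLab), to open analysis.ipynb
    pandas
    numpy
    scipy
    openpyxl (used by pandas to read .xlsx files)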

To reproduce results:

1. Ensure that data.xlsx is located in the same directory as the
   Jupyter notebook, or update the DATA_PATH variable in the code
   to point to the correct file location.

2. Open analysis.ipynb.

3. Run all cells sequentially.

Note: The example path used in the notebook may refer to a specific
environment (e.g., Google Colab). Users running the code locally may
need to adjust the file path to match their directory structure.
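
One way to make the path portable is to try several candidate locations
and use the first that exists. This is a sketch only: the function name
resolve_data_path and the candidate paths are hypothetical and are not
taken from the notebook.

```python
import tempfile
from pathlib import Path


def resolve_data_path(candidates):
    """Return the first candidate path that points to an existing file."""
    for candidate in candidates:
        path = Path(candidate).expanduser()
        if path.is_file():
            return path
    tried = [str(c) for c in candidates]
    raise FileNotFoundError(f"data file not found; tried: {tried}")


# Demonstration with an empty placeholder file in a temporary directory.
# The first candidate is a made-up path that is expected not to exist.
with tempfile.TemporaryDirectory() as tmp:
    placeholder = Path(tmp) / "data.xlsx"
    placeholder.touch()
    found = resolve_data_path(["/nonexistent/data.xlsx", placeholder])
    found_name = found.name
    print(found)
```

In the notebook, the result could be assigned to DATA_PATH and passed to
pandas.read_excel, assuming that is how the notebook loads data.xlsx.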

------------------------------------------------------------
7. Additional Notes
------------------------------------------------------------

- Revenues are expressed in millions of U.S. dollars.
- No external data sources are required.
- No random simulations are used; results are fully deterministic.